Efficient Decomposed Learning for Structured Prediction
Structured prediction is the cornerstone of several machine learning
applications. Unfortunately, in structured prediction settings with expressive
inter-variable interactions, exact inference-based learning algorithms, e.g.
Structural SVM, are often intractable. We present a new approach, Decomposed
Learning (DecL), which performs efficient learning by restricting the inference
step to a limited part of the structured space. We provide characterizations
based on the structure, target parameters, and gold labels, under which DecL is
equivalent to exact learning. We then show that in real world settings, where
our theoretical assumptions may not completely hold, DecL-based algorithms are
significantly more efficient and as accurate as exact learning. Comment: ICML201
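The restricted-inference idea above can be sketched concretely. The snippet below is a minimal, illustrative version under assumed simplifications: the "limited part of the structured space" is taken to be all labelings within Hamming distance k of the gold labeling, and the update is a structured-perceptron step rather than the Structural SVM formulation used in the paper; `phi` is a hypothetical joint feature function.

```python
import itertools
import numpy as np

def decl_neighborhood(y_gold, k, num_labels):
    """All labelings within Hamming distance <= k of the gold labeling:
    the restricted search space used in place of full exact inference."""
    n = len(y_gold)
    space = {tuple(y_gold)}
    for positions in itertools.combinations(range(n), k):
        for labels in itertools.product(range(num_labels), repeat=k):
            y = list(y_gold)
            for pos, lab in zip(positions, labels):
                y[pos] = lab
            space.add(tuple(y))
    return space

def decl_perceptron_step(w, phi, x, y_gold, k, num_labels, lr=0.1):
    """One perceptron-style update where the argmax is taken only over
    the DecL neighborhood instead of the full exponential label space."""
    space = decl_neighborhood(y_gold, k, num_labels)
    y_hat = max(space, key=lambda y: w @ phi(x, y))
    if y_hat != tuple(y_gold):
        w = w + lr * (phi(x, tuple(y_gold)) - phi(x, y_hat))
    return w
```

For a chain of length n with L labels, the neighborhood has O(n^k L^k) elements, so the inner argmax is polynomial rather than exponential in n.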
Algorithms for structural learning with decompositions
Structured prediction describes problems which involve predicting multiple output variables with expressive and complex interdependencies and constraints. Learning over expressive structures (called structural learning) is usually time-consuming as exploring the structured space can be an intractable problem. The goal of
this thesis is to present different techniques for structural learning which learn by decomposing the problem space into simpler, tractable components. We consider three settings: fully supervised; unsupervised
and semi-supervised; and discriminative latent variable learning, and present learning techniques for each.
For supervised structural learning, we describe a paradigm called Decomposed Learning (DecL) which
decomposes the inference procedure during learning into small inference steps using additional application-specific information. For unsupervised learning, we propose a family of Expectation Maximization [Dempster et al., 1977] algorithms called Unified Expectation Maximization (UEM) [Samdani et al., 2012a] that covers several seemingly divergent versions of EM, e.g., hard EM. To efficiently add domain-specific declarative
constraints into learning, we use a dual projected subgradient ascent algorithm which naturally decomposes
the task into simpler components. In the discriminative latent variable scenario, we present a supervised
latent variable model for clustering called the Latent Left-Linking Model (L3M) that can efficiently cluster
items arriving in a streaming order. We decompose the learning process for L3M into small and efficient stochastic gradient descent steps that lead to rapid convergence.
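The left-linking idea for streaming clustering can be sketched as follows. This is a simplified, greedy version assuming a generic pairwise scoring function and a link threshold; L3M itself is a probabilistic latent-variable model trained with SGD, whereas this sketch only illustrates the left-linking inference pattern.

```python
def left_link_cluster(score_fn, items, threshold=0.0):
    """Greedy left-linking: each arriving item links to its best-scoring
    earlier (left) item, or starts a new cluster if no pairwise score
    clears the threshold. Illustrative sketch of L3M-style inference."""
    cluster_of = []          # cluster id assigned to each item, in order
    next_cluster = 0
    for i, item in enumerate(items):
        best_j, best_s = None, threshold
        for j in range(i):   # only items seen earlier are candidates
            s = score_fn(items[j], item)
            if s > best_s:
                best_j, best_s = j, s
        if best_j is None:   # no antecedent scored above threshold
            cluster_of.append(next_cluster)
            next_cluster += 1
        else:                # inherit the antecedent's cluster
            cluster_of.append(cluster_of[best_j])
    return cluster_of
```

Because each item only considers items to its left, the procedure runs in a single pass and never revisits earlier decisions, which is what makes the streaming setting tractable.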
Learning multi-linear representations of distributions for efficient inference
Abstract: We examine the class of multi-linear representations (MLR) for expressing probability distributions over discrete variables. Recently, MLRs have been considered as intermediate representations that facilitate inference in distributions represented as graphical models. We show that MLR is an expressive representation of discrete distributions: it can concisely represent classes of distributions that have exponential size in other commonly used representations, while supporting probabilistic inference in time linear in the size of the representation. Our key contribution is a set of techniques for learning bounded-size distributions represented using MLR, which support efficient probabilistic inference. We demonstrate experimentally that the MLR representations we learn support accurate and very efficient inference. Keywords: Learning probability distributions · Multi-linear polynomials · Probabilistic inference · Graphical models
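The "inference linear in representation size" claim can be illustrated with a small sketch. Here a multi-linear polynomial over indicator variables is assumed to be stored as a list of (coefficient, literals) monomials; evaluating it with indicators set to 1 for unobserved variables and per the evidence otherwise yields the evidence probability in one pass over the terms. This is a toy encoding for illustration, not the paper's learned bounded-size representation.

```python
def eval_multilinear(terms, evidence):
    """Evaluate a multi-linear polynomial over variable indicators.
    terms: list of (coeff, {var: value}) monomials.
    evidence: dict {var: observed_value}; unobserved indicators are 1.
    Returns the probability of the evidence, in time linear in len(terms)."""
    total = 0.0
    for coeff, literals in terms:
        # The monomial's indicator product is 1 iff every literal is
        # consistent with the evidence (unobserved vars always match).
        if all(evidence.get(v, val) == val for v, val in literals.items()):
            total += coeff
    return total
```

For a distribution where the polynomial has few terms, this is exponentially cheaper than summing over all joint assignments; the learning problem in the abstract is precisely to find such bounded-size polynomials.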
Unified expectation maximization
We present a general framework containing a graded spectrum of Expectation Maximization (EM) algorithms called Unified Expectation Maximization (UEM). UEM is parameterized by a single parameter and covers existing algorithms like standard EM and hard EM, constrained versions of EM such as Constraint-Driven Learning (Chang et al., 2007) and Posterior Regularization (Ganchev et al., 2010), along with a range of new EM algorithms. For the constrained inference step in UEM we present an efficient dual projected gradient ascent algorithm which generalizes several dual decomposition and Lagrange relaxation algorithms popularized recently in the NLP literature (Ganchev et al., 2008; Koo et al., 2010; Rush and Collins, 2011). UEM is as efficient and easy to implement as standard EM. Furthermore, experiments on POS tagging, information extraction, and word alignment show that often the best performing algorithm in the UEM family is a new algorithm that wasn't available earlier, exhibiting the benefits of the UEM framework.
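The single-parameter spectrum can be sketched as a temperature-scaled E-step. In this illustrative (unconstrained) reading, the UEM posterior is proportional to the model posterior raised to 1/gamma: gamma = 1 recovers the standard soft-EM posterior, and gamma -> 0 approaches hard EM's point mass on the argmax. The function name and interface are assumptions for illustration, not the paper's API.

```python
import numpy as np

def uem_posterior(log_p, gamma):
    """E-step posterior over candidate latent assignments.
    log_p: unnormalized log P(h | x; theta) per candidate h.
    gamma = 1: standard EM; gamma = 0: hard EM (point mass on argmax)."""
    log_p = np.asarray(log_p, dtype=float)
    if gamma == 0.0:
        q = np.zeros_like(log_p)      # hard EM: all mass on the best h
        q[np.argmax(log_p)] = 1.0
        return q
    z = log_p / gamma                 # temperature-scaled scores
    z -= z.max()                      # shift for numerical stability
    q = np.exp(z)
    return q / q.sum()                # normalized softmax posterior
```

Varying gamma between these extremes yields the "range of new EM algorithms" the abstract refers to, without changing the M-step.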